\chapter{Memory allocator}
\label{app-memory-allocator}

\section{Memory is divided into chunks}
\label{sec-memory-allocator-memory-divided-into-chunks}

Available memory is divided into consecutive \emph{chunks} with no
space in between two chunks.  If there are two consecutive chunks $C1$
and $C2$ somewhere in available memory, then $C1$ is said to be
the \emph{preceding} chunk with respect to $C2$ and $C2$ is said to be
the \emph{following} chunk with respect to $C1$.

A chunk can either be \emph{in use} or \emph{free}. When a chunk is
freed, and either the preceding chunk is free, the following chunk is
free, or both the preceding and the following chunks are free, then the
free chunks are coalesced into a single free chunk.  As a consequence,
there are never two consecutive free chunks in memory.

Every chunk $C$ has an initial 64-bit word containing the \emph{size}
of the chunk in 8-bit bytes.  Since this value is always a multiple of
$4$%
\fixme{There is a question concerning whether it would be better to
align user data on cache lines.},
the last two bits are always $0$.  We use the \emph{next to last}
bit to indicate whether $C$ is in use or free.  If the bit is $1$, the
chunk $C$ is in use.  If the bit is $0$, the chunk $C$ is free.  We
use \emph{the last} bit to indicate whether the \emph{preceding chunk}
with respect to $C$ is in use or free.  If the bit is $1$, the
preceding chunk is in use.  If the bit is $0$, the preceding chunk is
free.  If there is no preceding chunk, i.e. $C$ is the first chunk in
memory, then the last bit of the first word is set to $1$ to avoid any
attempt to coalesce $C$ with some non-existing preceding chunk.
Notice that for all chunks except the last one, there are therefore
two bits in memory indicating whether the chunk is in use or free; one
bit in the first word of the chunk itself, and one bit in the
following chunk.

To obtain the size of the chunk in bytes, the last two bits of
the word must first be masked out, using the \texttt{and} operation
with a mask containing a $1$ in every position except the two least
significant ones.%
\fixme{There is a suggestion to store the size multiplied by 4 so that
  it can be obtained by shifting right by two bits.  This suggestion
  would avoid having a 64-bit mask.  It is quite unlikely that storing
  sizes this way would not be enough, at least for the foreseeable future.}

A free chunk is linked into a doubly linked list of chunks in the same
bin.  See \refSec{sec-memory-allocator-bins-of-chunks} for more
details about the available bins.  If there are two consecutive chunks
$C1$ and $C2$ in the linked list in a bin, then $C1$ is said to be the
\emph{previous} chunk with respect to $C2$ and $C2$ is said to be the
\emph{next} chunk with respect to $C1$.

The second 64-bit word of a free chunk is a pointer to the previous
chunk in the bin.  The third 64-bit word of a free chunk is a pointer
to the next chunk in the bin.  These words do not point to the
beginning of the previous or the next chunk.  Instead, the second word
contains the address of the third word of the previous chunk in the
bin, and the third word contains the address of the second word of the
next chunk in the bin.  If there is no previous chunk in the bin, then
the second word contains the address of a \emph{sentinel} that is the
beginning of the bin.  Similarly, if there is no next chunk in the
bin, then the third word contains the address of a sentinel that is
the end of the bin.  By using these sentinels, we are able to simplify
the algorithms for linking and unlinking a chunk.  The last word of a
\emph{free chunk} contains the size of the chunk, just like the first
word.  The last word of a chunk in use is reserved for user data.

Since a free chunk has at least four words in it, this is also the
minimum size allowed for any chunk.

\refFig{fig-chunks2} shows the constellation of two chunks where the
first chunk is in use and the second chunk is free.

\begin{figure}
\begin{center}
\inputfig{fig-chunks2.pdf_t}
\end{center}
\caption{\label{fig-chunks2}
Chunk in use followed by free chunk.}
\end{figure}

As \refFig{fig-chunks2} shows, the next-to-last bit of the first word
of the first chunk and the last bit of the first word of the second
chunk are both $1$, indicating that the first chunk is in use.  The
next-to-last bit of the first word of the second chunk is $0$
indicating that the second chunk is free.  The first chunk contains
the size word followed by user data.  The second chunk contains the
size in the first and the last word, and the second and third words of
the second chunk contain links to the previous and the next chunk in
the bin.

\refFig{fig-chunks3} shows the constellation of two chunks where the
first chunk is free and the second chunk is in use.

\begin{figure}
\begin{center}
\inputfig{fig-chunks3.pdf_t}
\end{center}
\caption{\label{fig-chunks3}
Free chunk followed by chunk in use.}
\end{figure}

As \refFig{fig-chunks3} shows, the next-to-last bit of the first word
of the first chunk and the last bit of the first word of the second
chunk are both $0$, indicating that the first chunk is free.  The
next-to-last bit of the first word of the second chunk is $1$
indicating that the second chunk is in use.  The first chunk contains
the size in the first and the last word, and the second and third
words of the first chunk contain links to the previous and the next
chunk in the bin.  The second chunk contains the size word followed by
user data.

\refFig{fig-chunks4} shows the constellation of two chunks where both
chunks are in use.

\begin{figure}
\begin{center}
\inputfig{fig-chunks4.pdf_t}
\end{center}
\caption{\label{fig-chunks4}
Two chunks in use.}
\end{figure}

As \refFig{fig-chunks4} shows, the next-to-last bit of the first word
of the first chunk and the last bit of the first word of the second
chunk are both $1$, indicating that the first chunk is in use.  The
next-to-last bit of the first word of the second chunk is also $1$
indicating that the second chunk is in use as well.  Each of the
chunks contains the size in the first word and the remaining words of
the chunk contains user data.

\section{Bins of chunks of similar size}
\label{sec-memory-allocator-bins-of-chunks}

As indicated in
\refSec{sec-memory-allocator-memory-divided-into-chunks}, we maintain
a number of \emph{bins} containing chunks of similar size.  There are
$512$ bins in total, numbered from $0$ to $511$.  Each of the bins
from $0$ to $63$ contains chunks of a single size.  Bin $0$
contains chunks with $4$ words (the minimum size) and bin $63$
contains chunks with $67$ words.  Bins starting at $64$ contain chunks
with a size greater than the maximum size of chunks in the previous
bin and less than or equal to some maximum size that is indicated
explicitly.  The maximum size of chunks in bins $64$ to $511$ grows by
roughly less than 10\% compared to the previous one.  Thus the maximum
size of chunks in bin $64$ is $73$, that of chunks in bin $65$ is
$79$, etc.  The maximum chunk size of bin $511$ is $2^{61}$ words, or
$2^{64}$ bytes.  In bins $0$ to $63$, chunks are sorted by address.
In bins $64$ to $511$ chunks are sorted first by size and then (if
several chunks have the same size) by address.%
\fixme{There is a question concerning whether odd-sized chunks are
  needed.  I think the answer is that they are not strictly needed,
  but that more space would be wasted without them.}

Three vectors of $512$ elements each are used to manage the bins.  One
vector contains the maximum size%
\footnote{In the implementation, the sizes in this vector are given in
  number of 8-bit bytes, rather than in number of 64-bit words, so as
  to avoid unnecessary arithmetic operations in the allocator
  algorithms.}  of chunks in the bin that corresponds to the index.
The second vector contains the first sentinel of each bin.  The third
vector contains the last sentinel of each bin.  When the bin is empty,
the element in the second vector contains the address of the element
in the third vector and vice versa.  In addition to these three
vectors, we also maintain a bitmap%
\footnote{This bitmap is not yet implemented.}  consisting of $8$
$64$-bit words, containing a $1$ if the corresponding bin has at least
one chunk in it and $0$ if the corresponding bin contains no chunks.
\refFig{fig-bins} illustrates the organization of the bins.

\begin{figure}
\begin{center}
\inputfig{fig-bins.pdf_t}
\end{center}
\caption{\label{fig-bins}
Organization of bins.}
\end{figure}

In the example in \refFig{fig-bins}, bin $0$ contains two chunks, bin
$63$ a single chunk, and bin $66$ contains three chunks.  Bins $64$,
$65$, and $511$ contain no chunks.

\section{Linking a chunk into a bin}

To link an arbitrary chunk of $n$ words into a bin, we first determine
which bin it should be linked into.  If $n \le 67$ then the chunk goes
into bin $n-4$.  If not, we do a binary search to find the bin with
the smallest maximum chunk size that is greater than or equal to $n$.
We then do a linear search of the chunks in the bin, ending either
when we find the last sentinel, or when we find the first chunk that
should be located after the one to be inserted.  Finally, the chunk to
be inserted is linked in using the normal insertion procedure for
doubly linked lists.

\section{Allocating a chunk}

To allocate a chunk with at least $n$ words in it, we first determine
the bin $b$ with the smallest possible maximum chunk size that is
greater than or equal to $n$.  If $n \le 67$ then the $b$ is computed
as $n-4$.  If not, we do a binary search to find $b$.

It is possible that $b$ is empty.  We must therefore find the smallest
bin $b'$ such that $b' \ge b$ and $b'$ is not empty.  We first compute
$q = b \thinspace div \thinspace 64$ and $r = b \thinspace
mod \thinspace 64$ indicating that $b$ corresponds to position $r$ in
bitmap $q$.  We then compute a mask $m$ consisting of $r$ leading $0$s
and $64-r$ trailing $1$s.  This mask is \texttt{and}-ed with the bitmap.  We
then find the first $1$ in the resulting value.  If such a position
exists, then we have found $b'$.  If not, we search bitmaps $q+1$,
$q+2$ etc., without any mask, until a set bit is found.  The first
such bit found corresponds to $b'$.  If no such bit is found, we
request more memory from the operating system.

When a non-empty bin is found, if the bin index is less than or equal
to $63$ we use the first chunk in the bin.  If the bin index is
greater than or equal to $64$, we do a linear search of the chunks in
the bin, and use the first chunk that is greater than or equal to $n$
words.

Once we find a chunk $c$ that is greater than or equal to $n$ words,
there are two cases:

\begin{enumerate}
\item If the size of $c$ is less than or equal to $n-4$, then we
  unlink the chunk from the bin and return it.  The reason for the $4$
  is that we can not use a residue less than $4$ words.
\item If the size $s$ of $c$ is strictly greater than $n$, then we
  unlink $c$ from its bin.  We then split $c$ into a chunk $c1$ of
  size $n$ and a chunk $c2$ of size $s-n$.  The chunk $c2$ is then
  linked into the bin that corresponds to its size.  The free/used
  bits are updated to reflect the fact that $c1$ is now used.
  Finally, the chunk $c1$ is returned.
\end{enumerate}

\section{Freeing a chunk}

When a chunk $C$ is freed, there are four possible situations (recall
that ``preceding'' and ``following'' refer to the order between chunks
in the address space):

\begin{enumerate}
\item The chunk $P$ preceding $C$ and the chunk $F$ following $C$
  are both in use.  Then, we just link $C$ into an appropriate
  bin.
\item The chunk $P$ preceding $C$ is in use but the chunk $F$
  following $C$ is free.  Then we first coalesce $C$ with $F$, and
  then we link the resulting chunk into an appropriate bin.
\item The chunk $P$ preceding $C$ is free but the chunk $F$ following
  $C$ is in use.  Then we first coalesce $C$ with the $P$, and then we
  link the resulting chunk into an appropriate bin.
\item The chunk $P$ preceding $C$ and the chunk $F$ following $C$ are
  both free.  Then we first coalesce $C$ both with $P$ and $F$, and
  then we link the resulting chunk into an appropriate bin.
\end{enumerate}

To determine whether the chunk $P$ preceding $C$ is in use or free, we
consult the last bit of the first word $C$.  If it is $1$, then
$P$ is in use.  If it is $0$, $P$ is free.  Only when this bit is $0$ is
it possible to find the beginning of $P$, because only then does the
last word of $P$ contain the size of $P$.  And that size is needed to
find the beginning of $P$.  This is the main reason for storing the
in-use bit in the chunk following the one that is concerned.

To determine whether the chunk $F$ following $C$ is in use or free,
use the size of $C$ to find the beginning of $F$.  We then consult the
next-to-last bit of the first word of $F$.  If it is $1$, then $F$ is
in use.  If it is $0$, then $F$ is free.

To coalesce the chunk $P$ preceding $C$ (case 2 above) with $C$, $P$
must first be unlinked from its bin.  Then the size contained in the
first word of $P$ is modified to contain the sum of the initial sizes
of $P$ and $C$.  This new size is then stored in the last word of $C$
as well.  Finally, the resulting chunk is linked into the bin
corresponding to the new size.

To coalesce the chunk $C$ with the chunk $F$ following $C$ (case 3
above), $F$ must first be unlinked from its bin.  Then the size
contained in the first word of $C$ is modified to contain the sum of
the initial sizes of $C$ and $F$.  This new size is then stored in the
last word of $F$ as well.  Finally, the resulting chunk is linked into
the bin corresponding to the new size.

To coalesce the chunk $P$ preceding $C$, $C$ itself, and the chunk $F$
following $C$ (case 4 above), both $P$ and $F$ must first be unlinked
from their respective bins.  Then the size contained in the first word
of $P$ is modified to contain the sum of the three sizes.  This new
size is then stored in the last word of $F$ as well.  Finally, the
resulting chunk is linked into the bin corresponding to the new size.

\section{Concurrency}

To be filled in.  Talk about what kind of synchronization is required.

